Abstract

Proportional transaction costs present difficult theoretical problems
in trading algorithm design, on account of their lack of analytical
tractability. The author derives a solution of DT-NT-DT form for an
arbitrary model in which the traded asset has diffusive dynamics
described by one or more stochastic risk factors. The width of the NT
zone is found to be, as expected, proportional to the cube root of the
transaction cost. It is also proportional to the 2/3 power of the
volatility of the target position, thereby causing a faster trading
strategy to be buffered more than a slower one. The displacement of the
middle of the buffer from the costfree position is found to be
proportional to the square of the width, and hence to the 2/3 power of
the transaction cost; the proportionality constant depends on the
expected short-term change in position.



1 Introduction 

In this paper we consider the effect of proportional transaction costs on the 
trading of an arbitrary 'synthetic' asset that follows a diffusion process whose 
drift and volatility terms are driven by stochastic factors. The tradable asset 
is Xt which in real probability measure evolves according to 

$$ dX_t = \mu_X(Z_t)\,dt + \sigma_X(Z_t)\,dW_{0,t} \qquad (1) $$



1 The term 'linear' costs often refers to the presence of fixed per-ticket cost and also a 
proportional part generated by a bid-offer independent of the trade size. As we are not 
considering a fixed part, we use the term 'proportional', whereas in [9] the term 'linear' 
was used for the same thing. 
in which $\mu_X$ and $\sigma_X$, which describe the drift and volatility of $X_t$, are
smooth functions, and $Z_t$ is a vector of factors, each component of which
follows its own ergodic process:

$$ dZ_{i,t} = \mu_{Z_i}(Z_t)\,dt + \sigma_{Z_i}(Z_t)\,dW_{i,t}. \qquad (2) $$

Call the position that one would trade to in the absence of costs the 'target 
position'. In the presence of proportional transaction costs, one cannot 
simply follow the optimal costfree strategy, as to do so would lose money at 
an infinite rate. Instead, a 'buffer' must be drawn around the target position, 
defining a no-trade (NT) zone in which the position is held unchanged and 
on either side a discrete-trade (DT) zone in which one trades immediately 
to the edge of the NT zone. The effect of the NT zone is to prevent the
strategy trading backwards and forwards in small amounts: typically the 
action is to execute a succession of small trades in one direction, then wait, 
then reverse. One must find the optimal buffer width. Too narrow, and one 
loses too much in costs by overtrading; too wide, and trading is performed 
so rarely that no revenue is generated. 
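
The DT-NT-DT rule just described can be sketched in a few lines. This is a minimal illustration, assuming a symmetric buffer of known half-width centred on the target position; the function name `buffered_position` and the sample numbers are illustrative and do not come from the paper.

```python
def buffered_position(prev_position, target, half_width):
    """Apply a DT-NT-DT rule with a symmetric buffer around the target.

    Inside the no-trade zone the position is left unchanged; outside it,
    the position is moved to the nearest edge of the zone (not to the target).
    """
    lower = target - half_width
    upper = target + half_width
    if prev_position < lower:
        return lower          # discrete-trade zone: buy up to the lower edge
    if prev_position > upper:
        return upper          # discrete-trade zone: sell down to the upper edge
    return prev_position      # no-trade zone: hold


# A target that drifts upward drags the position along the lower edge of the
# buffer; small reversals of the target cause no trading at all.
position = 0.0
path = []
for target in [0.0, 0.5, 1.0, 1.5, 1.4, 1.3, 1.6]:
    position = buffered_position(position, target, half_width=0.4)
    path.append(position)
```

In this toy path the position follows the target upward in a succession of small buys, then sits still while the target dips from 1.5 to 1.3, which is the behaviour the paragraph above describes.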

In a recent paper [9], a very specific one-dimensional example of (1) is
considered, in which $Z_1 = X$, and the exact shape of the optimal buffer
(i.e. where its ends should be, as a function of $X_t$) is derived in terms of
some functions relating to $\mu_X$, $\sigma_X$. Analysis of this solution by Taylor series
shows that in the limit of small costs the buffer width is, at leading order,
proportional to the cube root of the cost $\varepsilon$ of trading one unit of asset, and
the constant of proportionality is given (see eq. (7)). It is also shown that the
middle of the buffer is not quite at the target position (see eq. (8)), but this
is a smaller effect. In this paper, we extend to the multifactor case (1). We
first present an heuristic argument for the results. In the Appendix we give
a full derivation, the key idea of which is the reduction of the problem to a
one-dimensional system of the same type as [9]. We thereby obtain a similar
optimal trading rule, given by (9,10), and demonstrate via model examples
using synthesised and real data that it works quite well.

2 Absolute, rather than lognormal, volatility. 

3 Continuous first derivative; though mild singularities, such as that of $|z|$ at $z = 0$, are
not in fact a problem. We also need $\sigma_X > 0$.

4 Note incidentally that the DT-NT-DT form of solution is not needed in the effectively 
unrelated, and simpler, problem of purely quadratic costs, i.e. total cost $\propto$ (trade size)$^2$
with no proportional or fixed terms. In that setup one instead trades continuously towards
a target position. As pointed out by Garleanu and Pedersen [3], with purely quadratic
costs, when one determines what weight should be attached to all the various signals at 
one's disposal, the fast signals receive only a low weight as the slowness of trading causes 
them to be smoothed out. 
Background



We can think of $X_t$ as being a futures price, a combination of these, or more
generally some kind of swap PV, and so it is allowed to be negative; there
is no term corresponding to the risk-free interest rate in its drift, because
it is not the PV of a cash asset. Thus for example rather than trading
stocks and cash bonds, we are taking risk through their synthetic analogues, 
namely stock futures, bond futures, interest rate swaps and CDS. (For a 
general background to this setup, and to transaction costs in general, the 
reader is referred to [9].) 

The factors represented by Z t may be exogenous (specified by the mod- 
eller, as for example macroeconomic ones) or endogenous (intrinsic to the 
time series of Xt, as for example momentum), and for these purposes it does 
not matter which, but there are some subtleties of interpretation that are 
more conveniently discussed later. 

The ergodicity assumption is a subtle one: although there is no explicit 
requirement for the factors to be mean-reverting, it is usually necessary 
when defining a time series model for a realisation-average, i.e. an average 
over a probability distribution, to be interchanged with a time-average. This 
implicitly happens when the model is traded: the effect of making decisions 
that are 'on average correct' (=realisation average) becomes apparent as 
time progresses. It also occurs when a model is back-tested, as one hopes 
that a reasonable coverage of scenarios (realisation average) has been ob- 
tained in the course of history (time average). The universally-employed 
technique of standardising a factor, that is, subtraction of its unconditional 
mean followed by division by its unconditional standard deviation, naturally 
causes it to become mean-reverting. There is no implication in this paper 
that $X_t$ itself is mean-reverting, as the functions $\mu_X$, $\sigma_X$ are arbitrary.

Optimality, objective, and utility 

Our construction is 'optimal', in the sense of maximising the value functional

$$ V_t = E_t\left[ \int_{s=t}^{\infty} e^{-r(s-t)}\, \mathcal{U}(\theta_s\,dX_s) \right] \qquad (3) $$

where $\theta_t$ is the position in $X_t$ at time $t$ and $\mathcal{U}$ denotes the utility of
instantaneous changes in P&L. As the dynamics are diffusive, without jumps,
we only require $\mathcal{U}$ and its first two derivatives at the origin. We stipulate
$\mathcal{U}(0) = 0$, $\mathcal{U}'(0) = 1$, $\mathcal{U}''(0) = -1/G$, so that $G$, which is fixed, is a measure
of risk appetite and has units of money (because $\mathcal{U}$ does, in our formulation).
Then we can recast (3) as

$$ V_t = E_t\left[ \int_{s=t}^{\infty} e^{-r(s-t)} \left( \theta_s\,\mu_X(Z_s) - \frac{\theta_s^2\,\sigma_X(Z_s)^2}{2G} \right) ds \right] \qquad (4) $$

which makes it clear that the objective function rewards upward drift, and
penalises volatility, in P&L. We seek an optimal solution $\theta_t$ that depends
only on the factors at time $t$ and the current position $\theta_{t^-}$.

5 So in the risk-neutral world it would have zero drift.

6 The treatment here is incidentally along the same lines as [5].

7 Number of lots, for futures contracts; notional, for OTC contracts such as swaps.

In the absence of transaction costs the optimal position is simply

$$ \hat g_0(Z_t) = \frac{\mu_X(Z_t)\,G}{\sigma_X(Z_t)^2} \qquad (5) $$

so that one trades to this target position without reference to the current
position. (The 0 in $\hat g_0$ indicates the transaction-free solution, and the hat
denotes optimality.) This is of the familiar form "expected return ÷ variance,
× gearing factor" (see also the derivation in [1, §14]).



Interpretation of the gearing factor G 

The gearing factor expresses the utility of risk, and can be understood by
appeal to volatility of P&L. The expected return in a time interval $dt$ is
$\theta_t \mu_X\,dt$ and the variance of the P&L is $\theta_t^2 \sigma_X^2\,dt$. By (5), the mean P&L, the
variance of the P&L, and the Sharpe ratio for a time period $[0,T]$ are

$$ G \int_0^T \frac{\mu_X^2}{\sigma_X^2}\,dt, \qquad G^2 \int_0^T \frac{\mu_X^2}{\sigma_X^2}\,dt, \qquad \left( \int_0^T \frac{\mu_X^2}{\sigma_X^2}\,dt \right)^{1/2}. \qquad (6) $$

So the standard deviation (stdev) of the P&L is simply the Sharpe ratio
multiplied by $G$. The interpretation of $G$ is now clear: it is the desired
stdev of P&L, per unit Sharpe.
Suppose that a strategy for trading a particular asset has an (estimated) 
annual Sharpe ratio of 0.8 and the desired annual P&L stdev is $60M. Then 
the gearing is set to G = $75M. 
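
The arithmetic of this example, together with the cost-free target of eq. (5), can be written out as a short sketch. The function names and the inputs `mu_x = 0.1`, `sigma_x = 0.5` are hypothetical and serve only to illustrate the "expected return ÷ variance × gearing" form.

```python
def gearing_for(target_pnl_stdev, sharpe):
    """Gearing G chosen so that the P&L stdev equals Sharpe ratio times G."""
    return target_pnl_stdev / sharpe


def target_position(mu_x, sigma_x, gearing):
    """Cost-free target position: expected return / variance, times gearing."""
    return mu_x * gearing / sigma_x ** 2


# Worked example from the text: annual Sharpe 0.8, desired annual stdev $60M.
G = gearing_for(60e6, 0.8)

# Hypothetical drift and volatility, purely for illustration.
position = target_position(mu_x=0.1, sigma_x=0.5, gearing=G)
```

Because the target is linear in the gearing, doubling `G` doubles every position and hence the P&L volatility, while leaving the Sharpe ratio unchanged as long as costs are proportional.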

Another interpretation of G: Lagrange multiplier. Consider the usual
optimisation: maximise return subject to risk $\le$ some value $Q$, say. If
we understand risk to be quadratic variation then the Lagrangian is

$$ \theta\,E[dX_t] + \lambda\left( \theta^2\,V[dX_t] - Q\,dt \right) $$

and $\lambda = -1/(2G)$.

In practice, in a fund, gearing is done as follows. First, optimise and 
provide a backtest of the optimised strategy using, for example, G = $1M. 
This provides an estimate of the historical Sharpe ratio and P&L volatil- 
ity. Other strategies are treated similarly. At a higher level, the portfolio 
manager uses this information, and the correlations between strategies, to 
determine an 'allocation' to each, or, in effect, a level of P&L volatility for 
each to run: one then scales one's G to suit. Should all strategies have 
the same G, then the risk being run on each is directly proportional to 
its Sharpe. This assumes, as we are doing here, that transaction costs are 
proportional and there is no degradation in Sharpe as more risk is run. 

Summary of [9]

The work in [9], and here, is a departure from current literature such as 
ttU 021 E] in a few respects. First, we optimise an objective function that 
has an infinite horizon, not a finite one, and is dependent not on terminal 
utility but on quadratic variation (in a more general setting, utility of P&L 
variation). Hence the upper limit in (3,4) is infinite, and the value function
satisfies an ordinary differential equation (ODE) rather than a PDE. This 
enables the value function to be written down easily as a function of the 
DT-NT-DT geometry, and then optimised. 

Another important distinction is that risk is taken synthetically here. 
Whereas in the Merton problem there is a riskless asset and a risky asset 
(e.g. a stock), here we take risk through a futures position, trading one asset 
only. This means that we have different boundary conditions; in fact we 
theoretically permit unlimited loss but in practice the probability of this is 
too small to be significant. 

The main result of [9] is that the optimal NT boundary can be determined
exactly, more or less in closed form through a pair of coupled nonlinear
(but not differential) equations. Using Taylor series expansion, one can
study this solution in the limit of small transaction costs, and finds that
the half-width of the NT zone (buffer) is, expressed as an amount of asset
to be traded,

$$ \delta\theta = \left( \frac{3\,\varepsilon\,G\,|\hat g_0'(X_t)|^2}{2} \right)^{1/3} + O(\varepsilon) \qquad (7) $$

and the displacement of the buffer from the costfree position is

where $\varepsilon$ is the cost of buying or selling one lot of $X$. (N.B. Do not confuse
$\delta\theta$ with $d\theta$.) Both expressions are directly proportional to the gearing $G$, as
expected, because $\hat g_0$ contains a factor of $G$ also. The cube-root dependence
of $\delta\theta$ on cost is a well-known result. Higher powers go up in steps of $\varepsilon^{2/3}$. In
single-period models, or situations in which one trades into a position and
then holds it indefinitely, there is no $O(\varepsilon^{1/3})$ term and the behaviour is $O(\varepsilon)$:
thus the $O(\varepsilon^{1/3})$ behaviour arises from continuous trading. Informally, the
constant of proportionality deals with the variability of the target position:
the more variable, the wider the buffer needs to be.

8 This is rigorously justified using the (Analytic) Implicit Function Theorem. The
algebra is lengthy, but not conceptually difficult, and the problem is much more tractable
than having to deal with the PDEs that arise in other authors' work. One does not have
to guess that an expansion in powers of $\varepsilon^{1/3}$ is necessary: it is an immediate consequence
of the Taylor series expansion. More details are in the Appendix and in the full version
of [9].

As our setup is not the same as that of other authors, the reader needs 
to exercise caution about comparing the results shown here with others: 
whereas there are clear similarities, such as the $\varepsilon^{1/3}$ dependence, and (less
importantly) the 3/2 factor, the remaining terms are not quite the same. The
interpretation of $|\hat g_0'(X_t)|^2$ as a trading speed is important and it is absent
from the basic Merton problem. In the Merton problem one rebalances 
stock and riskless asset so as to achieve a constant proportion, which we 
regard to be of little practical relevance. In this paper, rebalancing occurs 
because the factors, and hence the view of the profitability of the asset, and 
hence the desired trading position, change: in our view this is a much more 
satisfactory model of the way that investment management works. 



9 We can think of the buffer in 'position space', as in the present description, or in
'factor space', i.e. the amount the factor(s) must move to trigger a trade. We shall in due 
course need both ideas. 

10 Result for the halfwidth is corroborated, using entirely different methods, in the case 
where X follows an Ornstein-Uhlenbeck process, by Bouchaud & co-workers in [7]. 
Outline and summary of this paper

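The derivation is given in the Appendix and consists in reworking the
derivations of [9]. The result is that the halfwidth is

$$ \delta\theta \sim \left( \frac{3\,\varepsilon\,G\,\Gamma_0^2}{2} \right)^{1/3}, \qquad \Gamma_0^2 = \frac{V_t[d\hat g_0(Z_t)]}{V_t[dX_t]} \qquad (9) $$

and the displacement is

$$ d\theta \sim \frac{E_t[d\hat g_0(Z_t)]}{V_t[dX_t]} \left( \frac{2\,\varepsilon^2 G^2}{3\,\Gamma_0^2} \right)^{1/3}. \qquad (10) $$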

We now discuss these results in turn and present some heuristic justifications 
for them. 

For the half-width, we have reinterpreted $\hat g_0'$ in (7) as the ratio of the
variation of the target position to the variation of the tradable asset:

$$ |\hat g_0'(X_t)|^2 \longrightarrow \frac{V_t[d\hat g_0(Z_t)]}{V_t[dX_t]} = \frac{\sigma_\theta^2}{\sigma_X^2}. \qquad (11) $$

In effect, what we have done is reinterpret $|d\hat g_0/dX|^2$ as $(d\hat g_0)^2/(dX)^2$ and
taken expectations. Clearly something had to be done about the $|\hat g_0'(X_t)|^2$
term in (7), for whereas in [9] the target position depended only on the
tradable, here it depends on the factors, so $\hat g_0'(X_t)$ is not meaningful in this
context. In the case of [9], the two expressions in (11) coincide. However,
(10) cannot be obtained by simply replacing $|\hat g_0'(X)|$. Note also that the
displacement is $O(\varepsilon^{2/3})$; it is therefore of less importance when costs are
small.

A few obvious points can be made about (9,10). As expected the RHS is
directly proportional to $G$ because $\hat g_0$ is, so the effect of $G$ is not interesting
(and does not affect, for example, the trading speed of the model). It can
also be seen that the buffer is wider when the attempted trading speed is
higher, i.e. $\Gamma_0$ is higher. The dependence on the volatility $\sigma_X$ of the traded
asset is more subtle. There is an explicit factor of $\sigma_X$ in the expression
for $\Gamma_0$, and other factors in the expression for $\hat g_0$. The effect is to reduce
the buffer width and also the target position as $\sigma_X$ increases; informally,
the buffer width remains roughly the same as a proportion of the 'typical'
target position, an issue that will become clearer when we make explicit
calculations later on.
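
The buffering rule can be sketched as a pair of functions. This is an illustration rather than the author's implementation: it assumes the half-width has the form $\delta\theta = (3\varepsilon G\Gamma_0^2/2)^{1/3}$ and the displacement the form $d\theta = (E_t[d\hat g_0]/V_t[dX_t])\,(2\varepsilon^2G^2/(3\Gamma_0^2))^{1/3}$, with the drift and variance expressed per unit time. All function and argument names are hypothetical.

```python
def half_width(eps, gearing, gamma0_sq):
    """Leading-order buffer half-width, (3 * eps * G * Gamma0^2 / 2)^(1/3)."""
    return (1.5 * eps * gearing * gamma0_sq) ** (1.0 / 3.0)


def displacement(eps, gearing, gamma0_sq, target_drift, asset_variance):
    """Leading-order shift of the buffer centre away from the target.

    target_drift   : expected change of the target position per unit time
    asset_variance : variance of the change in the tradable per unit time
    """
    scale = (2.0 * eps ** 2 * gearing ** 2 / (3.0 * gamma0_sq)) ** (1.0 / 3.0)
    return target_drift / asset_variance * scale


# Illustrative inputs only.
eps, gearing, gamma0_sq = 0.02, 1.0, 0.0256
width = half_width(eps, gearing, gamma0_sq)
shift = displacement(eps, gearing, gamma0_sq, target_drift=-0.004, asset_variance=0.25)
```

Under these assumed forms the product of the two quantities is `target_drift / asset_variance * eps * gearing`, independent of $\Gamma_0$, and multiplying the cost by 8 doubles the half-width, which is the cube-root law.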



11 $V_t$ denotes the variance, conditional on information known at time $t$.
The results can be justified by scaling arguments: a fuller discussion is
in [10]. The reduction in expected utility consequent on trading to the edge
of a buffer of width $\delta\theta$, rather than to the target position itself, is $\propto \delta\theta^2$,
as the utility is quadratic at its maximum. More precisely let $U$ denote the
rate of accumulation of expected utility in (3,4), i.e.

$$ U(z,\theta) = \frac{1}{dt}\,E_t[\,\mathcal{U}(\theta\,dX_t) \mid Z_t = z\,] = \mu_X(z)\,\theta - \frac{\sigma_X(z)^2\,\theta^2}{2G}; $$

then the reduction is $\frac12\,\delta\theta^2\,\partial^2 U/\partial\theta^2$, which is equal to $-\sigma_X^2\,\delta\theta^2/2G$. The time
taken to exit a buffer of width $\delta\theta$ scales as $\delta\theta^2$ divided by the square of the
volatility of the target position, i.e. $\delta\theta^2/\sigma_\theta^2$. The cost associated with this is
$\varepsilon\,\delta\theta$. Adding the two parts together gives the total utility loss rate through
suboptimal positioning and through explicit payment of transaction costs.
We are therefore to maximise

$$ -\frac{\sigma_X^2\,\delta\theta^2}{2G} - \frac{\varepsilon\,\delta\theta}{\delta\theta^2/k\sigma_\theta^2} $$

for some positive constant $k$, from which it is easily seen that the maximum
occurs when $\delta\theta$ is given by the expression in (9). That $k = 3/2$ comes from
comparison with (7).
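
The maximisation in this scaling argument is easy to check numerically. The sketch below assumes the objective is the positioning penalty $-\sigma_X^2\delta\theta^2/2G$ plus the cost rate $-\varepsilon\,\delta\theta/(\delta\theta^2/k\sigma_\theta^2)$ with $k = 3/2$, and compares a brute-force grid maximum with the closed form $\delta\theta^3 = k\,\varepsilon\,G\,\sigma_\theta^2/\sigma_X^2$. The parameter values and helper names are illustrative.

```python
def heuristic_objective(dtheta, eps, gearing, sigma_x, sigma_theta, k=1.5):
    """Utility rate lost to suboptimal positioning and to transaction costs."""
    positioning_loss = sigma_x ** 2 * dtheta ** 2 / (2.0 * gearing)
    exit_time = dtheta ** 2 / (k * sigma_theta ** 2)   # time to cross the buffer
    cost_rate = eps * dtheta / exit_time               # cost paid per unit time
    return -positioning_loss - cost_rate


def grid_argmax(f, lo, hi, n):
    """Return the grid point in [lo, hi] (n equal steps) at which f is largest."""
    best_x, best_val = lo, f(lo)
    for i in range(1, n + 1):
        x = lo + (hi - lo) * i / n
        val = f(x)
        if val > best_val:
            best_x, best_val = x, val
    return best_x


eps, gearing, sigma_x, sigma_theta = 0.02, 1.0, 0.5, 0.08
numeric = grid_argmax(
    lambda d: heuristic_objective(d, eps, gearing, sigma_x, sigma_theta),
    0.001, 1.0, 100000,
)
closed_form = (1.5 * eps * gearing * sigma_theta ** 2 / sigma_x ** 2) ** (1.0 / 3.0)
```

The objective is concave for positive widths, so the grid search and the closed form agree to within the grid spacing.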

An extension of this argument gives a result for the displacement, as
follows. Suppose that the drift $\mu_{\hat g_0}$ in target position is positive, and that
we currently have too high a position on, so that $\theta_{t^-}$ exceeds the target
position. If we sell now, it is likely that we will be undoing the trade as
the target position drifts back up, thereby resulting in an extra dose of
transaction cost. The extra cost is $\varepsilon\,d\theta$ and the time taken to drift towards
the edge of the buffer is of order $\delta\theta/\mu_{\hat g_0}$. The squared distances of the buffer
edges from the target are $(\delta\theta \pm d\theta)^2$, of which the average is $\delta\theta^2 + d\theta^2$, so we
insert this into the first term in the objective function (the penalty for
having a suboptimal position on). The revised objective function is

$$ -\frac{\sigma_X^2}{2G}\left( \delta\theta^2 + d\theta^2 \right) - \frac{\varepsilon\,\delta\theta}{\delta\theta^2/k\sigma_\theta^2} + \frac{\varepsilon\,d\theta}{\delta\theta/k'\mu_{\hat g_0}} $$

which gives, on maximising w.r.t. $d\theta$,

$$ \frac{\sigma_X^2\,d\theta}{G} = \frac{\varepsilon}{\delta\theta/k'\mu_{\hat g_0}} $$

for some other positive constant $k'$. Consistency with (8) requires $k' = 1$
and we deduce

$$ \delta\theta\,d\theta = \frac{\mu_{\hat g_0}}{\sigma_X^2}\,\varepsilon\,G \qquad (12) $$

which in conjunction with (9) gives (10). The revision to the objective
function does not vitiate the result we just derived for $\delta\theta$ (at leading order).

12 If one is prepared to take the numerical factors on trust.

The neatness of (9) is mathematically appealing and the formula is very
easy to implement; an exact solution is very difficult to derive, though it
was done in [9] for one factor $Z_1 = X$. We regard (10) as less important: it
is a smaller effect and the position-drift term is rather harder to estimate,
so in our numerical work we centre the buffer around the target position,
implicitly assuming that $d\theta = 0$.

The purpose of the next sections is to demonstrate the buffering rule
(9) using a variety of models of differing complexity to assess its empirical
validity. We first do this in a 'controlled experiment' using synthesised data 
from a model in which all parameters are known exactly, and find that it 
works well. Then we fit a momentum model to some real financial data, on 
which the underlying model and parameterisation can only be estimated, 
and find that it still works reasonably well. 



2 Examples using synthesised data 
Objective 

We first consider some models with synthesised data that will incorporate 
effects seen in real time series. There are advantages to this: we can separate 
out different effects in a controlled experiment, and we can simulate as much 
data as we please. We could simulate the various Brownian motions driving 
the factors, compute the positions and P&L net of transaction costs and 
hence the (discounted) utility, then repeat many times and average so as to 
obtain an estimate of the value function. However, a useful short-cut is to 
exploit the ergodicity of the models and replace a realisation-average with 
a time-average. Thus we simulate instead only one trajectory, for a long 
time period (and the discount factor can safely be ignored). Explicitly
the following quantity, which we call the empirical value function, is being
maximised:

$$ V_{\mathrm{emp}} = \sum_{i=0}^{N-1} \mathcal{U}\big( \theta_{t_i}(X_{t_{i+1}} - X_{t_i}) \big) - \varepsilon\,|\theta_{t_{i+1}} - \theta_{t_i}|. $$

In discrete time it will be necessary to specify an explicit utility function,
rather than just its first two derivatives at the origin. We have used
$\mathcal{U}(x) = (1 - e^{-x/G})\,G$ throughout. By the (empirical) account curve, we mean the
time series of integrated P&L after costs, i.e.

$$ \sum_{i=0}^{n-1} \theta_{t_i}(X_{t_{i+1}} - X_{t_i}) - \varepsilon\,|\theta_{t_{i+1}} - \theta_{t_i}|, \qquad 1 \le n \le N. $$

13 It is absent from the approximate solution for the optimal buffer width and
displacement.

Analogously with real data we do exactly the same thing. Of course, both 
$V_{\mathrm{emp}}$ and the account curve are random variables, but the idea is that the
implicit randomness is attenuated by observing for long enough time. Also, 
plotting a single time series and account curve makes for more convenient 
interpretation and illustration. 
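
The empirical value function and the account curve are straightforward to compute from a position series and a price series. The sketch below assumes `positions[i]` is the position held over the interval from time i to time i+1, so `positions` has one more element than there are price increments, and uses the exponential utility $(1 - e^{-x/G})G$ quoted above. Function names and the toy data are illustrative.

```python
import math


def exp_utility(x, gearing):
    """Utility (1 - exp(-x/G)) * G, written with expm1 for accuracy near zero."""
    return -math.expm1(-x / gearing) * gearing


def empirical_value(positions, prices, eps, gearing):
    """Sum over periods of utility of gross P&L minus proportional trading cost."""
    total = 0.0
    for i in range(len(prices) - 1):
        pnl = positions[i] * (prices[i + 1] - prices[i])
        cost = eps * abs(positions[i + 1] - positions[i])
        total += exp_utility(pnl, gearing) - cost
    return total


def account_curve(positions, prices, eps):
    """Running sum of P&L after costs, one entry per period."""
    curve, running = [], 0.0
    for i in range(len(prices) - 1):
        running += positions[i] * (prices[i + 1] - prices[i])
        running -= eps * abs(positions[i + 1] - positions[i])
        curve.append(running)
    return curve


# Toy data, purely for illustration.
prices = [100.0, 101.0, 100.5, 102.0]
positions = [1.0, 1.0, 2.0, 2.0]
curve = account_curve(positions, prices, eps=0.1)
value = empirical_value(positions, prices, eps=0.1, gearing=1.0)
```

Because the utility is concave with unit slope at the origin, the empirical value is never larger than the final point of the account curve; the two coincide only in the limit of large gearing.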

We wish to demonstrate whether the approximate buffering rule (9) is
optimal, but this presents us with a difficulty, as in principle we must test
against all other buffering schemes. Another difficulty is that the buffer
width is in general varying, so we cannot simply plot the buffer width against
the empirical value function, as there is no unique buffer width to plot. What
we can do easily, though, is multiply $\delta\theta$ in eq. (9) by some fixed amount $\lambda$; we
then plot the time-average of the buffer width on the horizontal axis, and on
the vertical axis the empirical value function. Repeating for different values
of $\lambda$ causes a curve to be described, and we highlight the point corresponding
to $\lambda = 1$. Finally, we repeat for different transaction cost parameters to give
a family of curves. Consider what the curve should look like, as a function
of $\lambda$. If the buffer width is too small ($\lambda \to 0$) then the value function will
drop (in continuous time it would drop to $-\infty$): the drop will be severe if
$\varepsilon$ is high. If the buffer is too wide ($\lambda \to \infty$) the NT zone will become so
large that no trading takes place, and the value will tend to zero. At some
intermediate point there should be a maximum; ideally the highlighted point
($\lambda = 1$) will be at the hump, indicating that no improvement can be made
by scaling (9) up or down by a fixed amount (though it does not rule out the
possibility that the buffer is suboptimal by virtue of being at some times too
wide and at other times too narrow). However, if the costs are high enough,
the value function will always be negative and there will be no hump: then
the strategy is worthless, irrespective of how well 'optimised' the buffer is.
Note that throughout we ignore the buffer displacement in our simulations,
assuming it to be zero.

Linear model 

We say that the model is linear, if $\mu_X$ depends linearly on $Z$ and $\sigma_X$ is
constant, with $Z$ a multivariate OU process. The simplest example is the
one-factor model,

$$ dX_t = \beta\sigma Z_{1,t}\,dt + \sigma\,dW_{0,t} \qquad (13) $$
$$ dZ_{1,t} = -\kappa_1 Z_{1,t}\,dt + \sqrt{2\kappa_1}\,dW_{1,t}. $$

This is a simple momentum model, parameterised by $\beta$ (the strength of
trending), $\sigma$ (the volatility of the traded instrument), $\kappa_1$ (the rate of
switching of trend) and $\rho_{01} = \mathrm{Corr}(dW_{0,t}, dW_{1,t})$, the correlation between changes
in the factor and the tradable. One simulates $W_{0,t}$ and $W_{1,t}$ first and from
that $Z_{1,t}$ and thence $X_t$. It is easily seen that

$$ \hat g_0(Z_t) = \frac{\beta Z_{1,t}\,G}{\sigma}, \qquad \Gamma_0^2 = \frac{2\beta^2\kappa_1 G^2}{\sigma^4}; $$

hence

$$ \delta\theta \sim \left( \frac{3\,\hat\varepsilon\,\kappa_1}{|\beta|} \right)^{1/3} \langle\theta^2\rangle^{1/2} $$

where $\hat\varepsilon = \varepsilon/\sigma$ is the cost per unit volatility of the tradable and
$\langle\theta^2\rangle^{1/2} = G|\beta|/\sigma$ is the r.m.s. position. Notice that $\Gamma_0$ and hence $\delta\theta$ are constant, and
$\rho_{01}$ does not play a part. Notice also that the buffer width and position are
both inversely proportional to $\sigma$, provided one fixes $\hat\varepsilon$. (If the volatility of the
underlying increases with $\varepsilon$ fixed, then the asset has actually become cheaper
to trade and the buffer width drops as a fraction of the typical position.)
Thus the only factors that link the buffer width to the r.m.s. target position
are $\hat\varepsilon^{1/3}$ and an extra quantity $\kappa_1/|\beta|$ that has dimensions time$^{-1/2}$; this
is necessary for dimensional agreement (because $\hat\varepsilon$ has dimensions time$^{1/2}$)
and can be thought of as the trading speed, because the higher $\kappa_1$ is the
more rapidly the momentum factor is changing direction. Finally, if $\beta \to 0$
then the buffer width becomes large as a fraction of the r.m.s. position (not
in absolute terms because the r.m.s. position reduces too): the explanation
for this is that the asset price has in effect become less predictable, or that
the trading signal is of lower quality: as expected, therefore, the NT zone
becomes relatively wide and cuts down the amount of trading.

14 As an ansatz the reader might wish to plot $-\varepsilon w^{-1} + (1 + w^2)^{-1/2}$ vs $w$, for $0 < w < \infty$.
For $\varepsilon < 1$, the curve has a local maximum above zero at $w$ obeying $w = \varepsilon^{1/3}(1 + w^2)^{1/2}$.
But for $\varepsilon > 1$, it has none and it is always negative.

15 Root mean square. Note $\langle Z_1 \rangle = 0$ and $\langle Z_1^2 \rangle = 1$. Do not confuse $\langle\theta^2\rangle^{1/2}$ with $\sigma_\theta$
which denotes the volatility of the position and pertains to changes in position. We could
also write $\langle\theta^2\rangle = V_\emptyset[\theta_t]$, i.e. the variance of $\theta_t$ given no information: by stationarity, these
are the same.
The displacement of the buffer is given by

$$ d\theta \sim -\left( \frac{\hat\varepsilon^2\,\kappa_1^2}{3\,\beta^2} \right)^{1/3} \hat g_0(Z_t) $$

thereby reducing the magnitude of the position, compared with the target
position, by the indicated factor. In the context of the examples we are
about to show, this is typically a few percent, which justifies our decision to
ignore it.

Figure 1(a) shows results with $\kappa_1 = 0.02$, $\beta = 0.2$, $\sigma = 0.5$, $\rho_{01} = 0$,
for transaction cost $\varepsilon$ = 0.02, 0.05, 0.1, 0.2, 0.5. The gearing is fixed at
G = $1, and the position $\theta_t$ and the buffer size are notional allocations to
$X_t$ which here can be fractional. Note that the dimensions of $\kappa_1$, $\beta$, $\sigma$ are
respectively time$^{-1}$, time$^{-1/2}$, \$/time$^{1/2}$, where time is in the same units
as the horizontal axis (which might be thought of as business days). The
appearance of the graphs is as expected and the postulated rule (9) appears
to be optimal. For low costs the impact of getting the buffer wrong is quite
small, but for high costs it is much bigger.
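
An experiment of this kind can be sketched as follows. This is not the author's code: it assumes an Euler scheme with unit time step, zero correlation between the two Brownian motions, a position that starts at zero, and the closed-form linear-model half-width $(3\varepsilon G\Gamma_0^2/2)^{1/3}$ with $\Gamma_0^2 = 2\beta^2\kappa_1G^2/\sigma^4$. The path length, the seed, and the set of multipliers are arbitrary choices.

```python
import math
import random


def simulate_linear_model(n_steps, beta, sigma, kappa, seed=0):
    """Euler scheme (dt = 1) for the one-factor momentum model with rho_01 = 0."""
    rng = random.Random(seed)
    z, x = rng.gauss(0.0, 1.0), 0.0   # Z starts from its stationary N(0, 1) law
    factors, prices = [z], [x]
    for _ in range(n_steps):
        x += beta * sigma * z + sigma * rng.gauss(0.0, 1.0)
        z += -kappa * z + math.sqrt(2.0 * kappa) * rng.gauss(0.0, 1.0)
        factors.append(z)
        prices.append(x)
    return factors, prices


def linear_half_width(eps, beta, sigma, kappa, gearing):
    gamma0_sq = 2.0 * beta ** 2 * kappa * gearing ** 2 / sigma ** 4
    return (1.5 * eps * gearing * gamma0_sq) ** (1.0 / 3.0)


def empirical_value(factors, prices, eps, beta, sigma, gearing, half_width):
    """Trade to the buffer edge at each step, pay costs, accumulate utility."""
    position, value = 0.0, 0.0
    for i in range(len(prices) - 1):
        target = beta * factors[i] * gearing / sigma
        new_position = min(max(position, target - half_width), target + half_width)
        value -= eps * abs(new_position - position)
        position = new_position
        pnl = position * (prices[i + 1] - prices[i])
        value += -math.expm1(-pnl / gearing) * gearing
    return value


beta, sigma, kappa, gearing, eps = 0.2, 0.5, 0.02, 1.0, 0.1
n_steps = 20000
factors, prices = simulate_linear_model(n_steps, beta, sigma, kappa, seed=1)
base = linear_half_width(eps, beta, sigma, kappa, gearing)

# Scale the theoretical half-width by a fixed multiplier and record the
# empirical value per time step for each choice.
scan = {}
for lam in (0.25, 0.5, 1.0, 2.0, 4.0):
    scan[lam] = empirical_value(
        factors, prices, eps, beta, sigma, gearing, lam * base
    ) / n_steps
```

Plotting `scan` against `lam * base` for several values of `eps` gives one curve per cost level. A single simulated path is noisy, so a much longer path than the one used here may be needed before the maximum near a multiplier of one is clearly visible.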

Nonlinear coupling between tradable and factor 

We can replace X's drift with a nonlinear function of the factor(s):

$$ dX_t = \beta\sigma\,\gamma(Z_{1,t})\,dt + \sigma\,dW_{0,t}. \qquad (14) $$

Clearly

$$ \hat g_0(Z_t) = \frac{\beta\,\gamma(Z_{1,t})\,G}{\sigma}, \qquad \Gamma_0^2 = \frac{2\beta^2\,\gamma'(Z_{1,t})^2\,\kappa_1 G^2}{\sigma^4}. $$

Figure 1(b) is a repeat of Figure 1(a), with $\gamma(z) = \tanh(2z)$ (this choice
is arbitrary and inspired by the use of such sigmoidal functions in neural
network theory, where they are referred to as activation functions, see e.g. [1]).
Again the postulated rule (9) appears close to optimal.
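
With a nonlinear coupling the buffer width depends on the current value of the factor. The sketch below assumes the target is $\beta\gamma(Z_1)G/\sigma$ with $\gamma(z) = \tanh(2z)$ and that $\Gamma_0^2 = 2\beta^2\gamma'(Z_1)^2\kappa_1G^2/\sigma^4$, which is what Itô's lemma gives for the diffusive part of the target; the function names and parameter values are illustrative.

```python
import math


def gamma(z):
    """Sigmoidal coupling between the factor and the drift."""
    return math.tanh(2.0 * z)


def gamma_prime(z):
    """Derivative of tanh(2z), i.e. 2 / cosh(2z)^2."""
    return 2.0 / math.cosh(2.0 * z) ** 2


def nonlinear_target(z, beta, sigma, gearing):
    return beta * gamma(z) * gearing / sigma


def nonlinear_gamma0_sq(z, beta, sigma, kappa, gearing):
    return 2.0 * beta ** 2 * gamma_prime(z) ** 2 * kappa * gearing ** 2 / sigma ** 4


def nonlinear_half_width(z, eps, beta, sigma, kappa, gearing):
    gamma0_sq = nonlinear_gamma0_sq(z, beta, sigma, kappa, gearing)
    return (1.5 * eps * gearing * gamma0_sq) ** (1.0 / 3.0)


beta, sigma, kappa, gearing, eps = 0.2, 0.5, 0.02, 1.0, 0.1
width_at_centre = nonlinear_half_width(0.0, eps, beta, sigma, kappa, gearing)
width_saturated = nonlinear_half_width(2.0, eps, beta, sigma, kappa, gearing)
```

When the factor is large the sigmoid saturates, the target barely moves, and the computed buffer narrows accordingly; near zero the target is most sensitive to the factor and the buffer is widest.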

16 So for example a position $\theta = +2.157$ is permitted. When we do real contracts we will
work with the proper contract sizes, so then one unit of X will be 'small', so one might 
in context be trading 2157 lots each of size 0.001. 
Stochastic volatility



We can introduce a volatility factor by exponentiating an OU process and
using that in place of $\sigma_X$, as follows. With one return factor ($Z_1$) and one
volatility factor ($Z_v$), one has:

$$ dX_t = \beta\sigma_t Z_{1,t}\,dt + \sigma_t\,dW_{0,t} \qquad (15) $$
$$ \sigma_t = \bar\sigma \exp\left( \eta Z_{v,t} - \tfrac12\eta^2 \right) $$
$$ dZ_{1,t} = -\kappa_1 Z_{1,t}\,dt + \sqrt{2\kappa_1}\,dW_{1,t} $$
$$ dZ_{v,t} = -\kappa_v Z_{v,t}\,dt + \sqrt{2\kappa_v}\,dW_{v,t} $$

where in addition to the linear model $\eta$ is the relative amount by which
the instantaneous volatility $\sigma_t$ varies, $\kappa_v$ is the rate at which it reverts, and
$\rho_{1v} = \mathrm{Corr}(dW_{1,t}, dW_{v,t})$ controls the extent to which volatility is correlated
with asset price. Then

$$ \hat g_0(Z_t) = \frac{\beta Z_{1,t}\,G}{\sigma_t}, \qquad \Gamma_0^2 = \frac{2\beta^2\kappa_1 G^2}{\sigma_t^4} - \frac{8\beta\eta\sqrt{\kappa_1\kappa_v}\,\rho_{1v}\,\hat g_0(Z_t)\,G}{\sigma_t^3} + \frac{8\eta^2\kappa_v\,\hat g_0(Z_t)^2}{\sigma_t^2}; $$

note again that $\rho_{01}$ and $\rho_{0v}$ do not enter.

Figure 1(c) repeats the linear model of Figure 1(a), with additional
parameters $\eta = 0.4$, $\kappa_v = 0.005$ (and $\bar\sigma$ stays at 0.5). One can also combine the
nonlinear model of Figure 1(b) with stochastic volatility: see Figure 1(d).



Multiple factors 

We can easily introduce further factors alongside $Z_1$, thereby modelling
multiple predictors:

$$ dX_t = \big( \beta_1\gamma_1(Z_{1,t}) + \beta_2\gamma_2(Z_{2,t}) \big)\,\sigma_t\,dt + \sigma_t\,dW_{0,t} $$
$$ dZ_{i,t} = -\kappa_i Z_{i,t}\,dt + \sqrt{2\kappa_i}\,dW_{i,t}, \qquad i = 1,2 \qquad (16) $$

and $\sigma_t$, $Z_{v,t}$ as above. We set $\beta_i = 0.1$ and $\gamma_i(x) = \tanh(2x)$, but have the
$\kappa$'s different: $\kappa_1 = 0.02$ and $\kappa_2 = 0.005$, so that the first factor reverts four
times as rapidly as the second. The correlation between $dW_{1,t}$ and $dW_{2,t}$ is
$\rho_{12} = 0.5$; both are uncorrelated with $dW_{0,t}$ and $dW_{v,t}$.

The expression for $\Gamma_0$ is now cumbersome as it depends on all the various
drifts, volatilities and inter-factor correlations. We therefore use a simpler
idea that we shall reuse later. It is easy enough to estimate $\Gamma_0$ by forming
rolling historical estimates of the quadratic variation of the target position
$\theta_t$ and of $X_t$, from the simulated time series, and taking their quotient.
Using this method, we obtain Figure 1(e), and again the whole scheme seems
to work quite well.

17 In stock markets, for example, this is usually negative.
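
The rolling estimate can be sketched as a ratio of rolling sums of squared increments. The paper does not specify the window length or the weighting, so the equally weighted window, the function name, and the toy series below are assumptions made for illustration.

```python
def rolling_gamma0_sq(target, prices, window):
    """Rolling estimate of Gamma_0^2 as a ratio of realised quadratic variations.

    target : series of target positions, same length as prices
    prices : series of the tradable
    window : number of increments in each rolling sum
    Returns one estimate per full window; NaN where the price did not move.
    """
    d_target = [target[i + 1] - target[i] for i in range(len(target) - 1)]
    d_price = [prices[i + 1] - prices[i] for i in range(len(prices) - 1)]
    estimates = []
    for end in range(window, len(d_price) + 1):
        num = sum(d * d for d in d_target[end - window:end])
        den = sum(d * d for d in d_price[end - window:end])
        estimates.append(num / den if den > 0.0 else float("nan"))
    return estimates


# Toy check: a target that is exactly twice the price has Gamma_0^2 = 4.
prices = [0.0, 1.0, 0.5, 2.0, 1.5, 3.0]
target = [2.0 * p for p in prices]
estimates = rolling_gamma0_sq(target, prices, window=3)
```

Each estimate can then be fed into the half-width formula, so that the buffer widens automatically during periods in which the target position is moving quickly relative to the price.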

Consistently, the examples in Figure 1 show that the theoretical buffering
rule generates the highest empirical value, at least by comparison with
rules that are scaled up or down by some constant factor. The suggestion is
that one need not plot the graphs of value vs buffer width and search for the
maximum by hand, but instead trust that eq. (9) is a universal law, thereby
saving considerable effort.

To round off this section we give a view of the various parts of the model
with nonlinear coupling and stochastic volatility. Figure 2(a) shows the time
series of the factor $Z_{1,t}$ and the volatility multiplier $\exp(\eta Z_{v,t} - \tfrac12\eta^2)$, and
Figure 2(b) shows the time series of the tradable $X_t$. Figure 2(c) shows the
position $\theta_t$ when the costs are given by $\varepsilon = 0.2$ and using the theoretically
optimal buffer (which as can be seen from Figure 1(d) has average halfwidth
$\approx 0.3$) and also the corresponding account curve net of costs.

3 Examples using real data 
Model construction 

We return to the previously mentioned issue of exogenous vs. endogenous
factors. The models of the previous section lend themselves more readily to 
interpretation as exogenous factors, whose dynamics and interdependence 
are explicitly known. The models in this section are going to be of the 
price-driven, endogenous, type. This is important because in constructing 
price-technical models one does not — and does not want to — write down a 
complete model for the dynamics and interdependence of the various trading 
signals. Rather, one identifies the signals from the time series using some 
recipe, and some combination of these (let us assume linear) is used to form a 
prediction and hence an optimal 'target' position: this is completely specified 
by the signals and empirically-determined signal weights. For buffering (i.e. 
to get $\Gamma_0$ in (9)), one additionally needs to know the volatility of the target
position. The practical solution, as anticipated, is to estimate it empirically 
using an historical volatility estimate from previous days' trading (in a live 
system) or simulated trading (in a simulated system). 

A common trending indicator is a weighted average of past returns,

$$\mathcal{K}[X]_t = \int_0^\infty K(\tau)\, dX_{t-\tau}.$$

For example, the difference between two moving averages of prices, wherein
the fast average exceeding the slow average is an indicator of an up-trend,¹⁸
can be written in this form. If $X_t$ is an arithmetic Brownian motion of
volatility $\sigma_X$ then the process $\mathcal{K}[X]_t$ is stationary with variance
$\int_0^\infty K(\tau)^2\,d\tau \cdot \sigma_X^2$, provided the integral is convergent.¹⁹ This allows a
normalised momentum signal to be constructed and coupled into the dynamics
of the traded asset $X_t$ in the same way as in the synthetic examples,

$$dX_t = \beta\,\psi(Z_t)\,\sigma_t\,dt + \sigma_t\,dW_{0,t},$$

thereby giving a target position $\hat\theta_t = \beta\,\psi(Z_t)\,G/\sigma_t$. It is necessary to
estimate $\beta$ from the data either by regression or, more consistently with the
approach of maximising (3), by directly maximising the empirical value
function w.r.t. $\beta$; the two methods are very similar. One can have multiple
momentum factors, of different speeds, by using $\mathcal{K}$'s of different decay-rate,
thereby giving a model like (16).
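
The claim that a moving-average crossover is a weighted average of past returns can be checked in discrete time. The sketch below is an illustration under assumptions not taken from the paper: exponential moving averages initialised at the first price, a per-step volatility `sigma`, and a kernel truncated to the available history. With decay $a = 1 - 1/\text{span}$, the difference of the fast and slow averages equals $\sum_j (a_s^{j+1} - a_f^{j+1})\,\Delta X_{t-j}$.

```python
import math


def ema(prices, span):
    """Exponential moving average, initialised at the first price."""
    decay = 1.0 - 1.0 / span
    out = [prices[0]]
    for x in prices[1:]:
        out.append(decay * out[-1] + (1.0 - decay) * x)
    return out


def ma_diff_kernel(fast, slow, n_lags):
    """Weights on past returns equivalent to EMA_fast - EMA_slow."""
    a_f, a_s = 1.0 - 1.0 / fast, 1.0 - 1.0 / slow
    return [a_s ** (j + 1) - a_f ** (j + 1) for j in range(n_lags)]


def raw_momentum(prices, fast, slow):
    """Kernel-weighted sum of past price changes, most recent first."""
    rets = [b - a for a, b in zip(prices[:-1], prices[1:])]
    kern = ma_diff_kernel(fast, slow, len(rets))
    return sum(k * r for k, r in zip(kern, reversed(rets)))


def normalised_momentum(prices, fast, slow, sigma):
    """Divide by sigma * sqrt(sum K_j^2), the discrete analogue of the
    stationary standard deviation quoted in the text."""
    kern = ma_diff_kernel(fast, slow, len(prices) - 1)
    scale = sigma * math.sqrt(sum(k * k for k in kern))
    return raw_momentum(prices, fast, slow) / scale


# Hypothetical price path.
prices = [100.0, 101.0, 100.5, 102.0, 103.5, 103.0]
crossover = ema(prices, 2)[-1] - ema(prices, 8)[-1]
kernel_form = raw_momentum(prices, 2, 8)
signal = normalised_momentum(prices, 2, 8, sigma=1.0)
```

Using kernels with several (fast, slow) pairs gives momentum factors of different speeds in the same way.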



Results 

We have applied the model to a variety of different futures markets, of 
which four are shown here from different asset classes: bonds, energies, 
agriculturals and the CBOE VIX contract. In each case the time series of the 
traded asset Xt is given by stitching together the time series of the individual 
futures contracts, rolling 10 days before they expire²⁰: see Figure 3. Each
of these is assumed to exhibit trending to some extent, which should result 
in value generation. The gearing factor of each strategy is set to G = $1M 
and the resulting account curves are also shown in Figure 3, suggesting that
the trending property is reasonably exploitable. 

The effect of buffering on transaction costs is shown in Figure 4. The
buffer size is now a number of contracts, and the costs are in contract
points so that $\varepsilon = 0.005$ corresponds to a market 0.01 point wide, such as
89.16/89.17. Although the curves do not have the same 'ideal' shape as
they do in the synthesised examples, it is reasonably clear that the
theoretical optimum is close to optimal in practice too.
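
The mechanics of buffering can be sketched as follows. This is a minimal illustration, not the author's simulation code: it assumes a symmetric buffer centred on the target (the displacement is ignored, as in the demonstrations), trades only to the nearest edge of the no-trade zone, and charges a proportional cost `eps` per unit traded. The target path is hypothetical.

```python
def apply_buffer(targets, halfwidth, start=0.0):
    """Hold the position unless it leaves [target - h, target + h];
    if it does, trade to the nearest edge of that interval."""
    position = start
    path = []
    for target in targets:
        lower, upper = target - halfwidth, target + halfwidth
        if position < lower:
            position = lower   # buy up to the lower edge
        elif position > upper:
            position = upper   # sell down to the upper edge
        path.append(position)
    return path


def total_cost(path, eps, start=0.0):
    """Proportional transaction cost: eps times total absolute turnover."""
    cost, prev = 0.0, start
    for pos in path:
        cost += eps * abs(pos - prev)
        prev = pos
    return cost


# Hypothetical target positions, in contracts.
targets = [1.0, 1.2, 0.2, 0.9]
buffered = apply_buffer(targets, halfwidth=0.3)
unbuffered = apply_buffer(targets, halfwidth=0.0)
cost_buffered = total_cost(buffered, eps=0.005)
cost_unbuffered = total_cost(unbuffered, eps=0.005)
```

Sweeping `halfwidth` over a grid and recording utility net of `total_cost` is one way to produce curves of the kind shown in Figure 4.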

Notice for some of the contracts that trading generates almost no utility
for high transaction costs (though in context, the highest costs used are
much more severe than those that would be incurred in practice for these
contracts). Occasionally it is seen that when the buffer is wide, the value
might not quite decrease as the transaction cost increases. This is because
of the hysteresis introduced by the buffer, keeping the strategy stuck in the
same position for a long time (in context, years); whether it makes money or
not is then a matter of chance that would be averaged out if more simulation
data were available.

18 A very commonly used device discussed for example in [5, §9] and also in many online
articles on technical analysis, e.g. www.stockcharts.com/school

19 For background to this result and related issues see, for example, [8].

20 This can be done automatically in Bloomberg (GFUT <Go>). All these contracts are
designated as <Comdty> in Bloomberg, except VIX which is UX1 <Index>.

Notice also that for low transaction costs, the empirical value function
does not go to $-\infty$ in the limit of no buffering. This is because the
simulations are being done in discrete time, an issue requiring further research.

4 Conclusions 

We have demonstrated a rule (9) for the optimal buffer, or NT, width to
be applied to a diffusive factor model in the presence of proportional
transaction costs and it seems to work well. For low costs²¹ it seems to slightly
overestimate the optimal width in the 'real' examples we showed, and we 
think this is due in part to the time discretisation in the simulation (the 
theory is continuous-time). We have also derived the displacement of the 
buffer from the costfree position, but for the reasons stated here we regard 
it as unimportant and we have not used it in our demonstrations. 

Clearly it is important to know whether a strategy can make money 
after costs, even if it is profitable in theory. This is particularly true of 
mean-reverting or relative-value strategies, where transaction costs tend to
be a higher proportion of the P&L than for momentum strategies. Knowing 
how to correctly buffer a strategy is important when the transaction cost 
is high, as we have seen. If, despite optimising the model parameters and 
incorporating the buffer rule, the strategy's expected utility is still negative 
(which will be seen in simulation), then one knows to avoid it. 

Another way of avoiding strategies that cannot reasonably work under 
transaction costs is to look at the typical buffer size as a proportion of the 
typical trading position. Once this ratio gets too large, the trading model 
exhibits too much hysteresis, getting stuck in the same position for perhaps 
months or years, and is effectively inoperable. 

21 Strictly, this means lower transaction cost per unit volatility.

Figure 1: Empirical value function vs buffer size for different models and costs.
Models: (a) linear model, (b) model with nonlinear coupling, (c) linear model with
stochastic volatility, (d) nonlinear coupling and stochastic volatility, (e) model with
two prediction factors and using rolling estimation of $\Gamma_\theta$. Cost multiplier ($\varepsilon$) as
stated on graphs; • marks theoretical optimum (9).

Figure 2: For model with nonlinear dependence and stochastic volatility, this shows
(a) the factors (return factors $Z_{1,t}$, $Z_{2,t}$ in green/cyan, and volatility
$\sigma\exp(\eta Z_{v,t} - \frac12\eta^2)$ in red), (b) the tradable asset $X_t$, (c) the position taken and
account curve net of costs.

Figure 3: (Top) Time series for rolling front contracts of each of four futures,
labelled with their Bloomberg tickers: US 10Y Treasury bond (TY1), crude oil 
(CL1), rough rice (RR1), and VIX volatility index (UX1). 
(Bottom) Account curves before costs, for simple momentum strategy. 

Figure 4: Empirical value function vs buffer size for real examples. Cost multipliers
($\varepsilon$) are as stated on the graphs and are in contract points (not dollars). Buffer size
means the number of contracts. 

5 Appendix: Derivation of (9) and (10)




Figure 5: When a smooth surface is slightly deformed, the deformation can at any 
point be understood as a movement in the normal direction, which typically varies 
from point to point (arrowed). 

The key idea in the derivation is the reduction to one dimension, an
issue that we justify now. In the case of two factors the dependence of
target position on the factors, i.e. $\theta = g_0(Z_1, Z_2)$, is easily visualised in
three dimensions as a smooth surface, with $\theta$ upwards (see Figure 5; in a
higher number of dimensions the derivation still works, but is harder to
visualise). The creation of the NT region involves, effectively, the creation
of two copies of the surface, one placed a little above, the other a little below
(with a little deformation being applied, possibly, as the spacing may not be
the same everywhere). Let P be a point on a smooth surface S, and let p
be the normal to S at P. Consider what happens when S is moved a small
amount²². Motions in the plane of S, i.e. the two directions perpendicular
to p, have no effect: it is only movement in the normal direction that does
anything. The same principle holds in any number of dimensions. Thus to
work out the width of the NT zone, for small costs (when it will be small),
we only need to look in that normal direction, obtainable directly from $\nabla g_0$.
Consequently, all the machinery of [9] can be invoked.

22 Later on, in Figure 6, we show the projection of this setup on to the $(Z_1, Z_2)$ plane.
Under this projection, the vector p becomes what is marked as $\hat n$ on the diagram.

Focus on the (hyper)plane $\theta = \theta^*$, pick a point z on the optimal costfree
surface²³ $S_0 : g_0(Z) = \theta^*$, and in the vicinity of that point let the NT
zone boundaries be given locally by $S_+ : g_0(Z) = \theta^* + \delta\theta_+$ and
$S_- : g_0(Z) = \theta^* - \delta\theta_-$. This is shown in Figure 6 for dim Z = 2, but the construction
works in any dimension. Let $\hat n$ be a unit vector in the direction $\nabla g_0(z)$, which must
be normal to $S_0$. Let the normal cut $S_0$ at $Z = z_0$, $S_+$ at $z_+ = z_0 + \hat n\,\delta\zeta_+$
and $S_-$ at $z_- = z_0 - \hat n\,\delta\zeta_-$, and call the line segment between these last two
points $\delta\ell$. We focus on variation of the value function along $\delta\ell$.



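[Figure 6 diagram. Labels: the surfaces $S_+ : g_0(Z) = \theta^* + \delta\theta_+$, $S_0 : g_0(Z) = \theta^*$
and $S_- : g_0(Z) = \theta^* - \delta\theta_-$ in the $(Z_1, Z_2)$ plane; the normal direction $\nabla g_0$, $\hat n$;
and the segment $\delta\ell : Z = z_0 + \zeta\hat n$, $\zeta \in [-\delta\zeta_-, \delta\zeta_+]$.]

Figure 6: Section through the discrete-trade and no-trade zones, for a
two-dimensional factor model. All points in this plane correspond to the same position
($\theta^*$ say) in the traded asset; the dashed line is the costfree case.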

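Write $\mathcal{L}$ for the infinitesimal generator of the diffusion of Z, i.e.

$$\mathcal{L}V = \sum_i \mu_{Z_i}\frac{\partial V}{\partial Z_i} + \frac12\sum_{i,j} H_{ij}\frac{\partial^2 V}{\partial Z_i\,\partial Z_j},$$

where the covariance matrix H is given by $H_{ij}\,dt = E_t[dZ_{i,t}\,dZ_{j,t}]$.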

In the NT zone there is no trading so $V_t = f(Z_t,\theta)$ evolves according to

$$(-r + \mathcal{L})f(Z,\theta) = -U(Z,\theta) \tag{17}$$

where we have employed the usual Itô or Feynman-Kac argument and

$$U(z,\theta) = \frac{1}{dt}\,E_t[U(\theta\,dX_t) \mid Z_t = z] = \mu_X(z)\,\theta - \frac{\sigma_X(z)^2\,\theta^2}{2G}$$

is the rate of accumulation of expected utility. However, we are only
interested in variation along $\delta\ell$, and this is important because (17) thereby
reduces to an ordinary differential equation, with $\zeta$ the coordinate in the
$\hat n$-direction:

$$(-r + \mathcal{L}_{\hat n})f(\zeta,\theta) = -U(\zeta,\theta) \tag{18}$$

with

$$\mathcal{L}_{\hat n}[f] = \underbrace{\sum_i \hat n_i\,\mu_{Z_i}}_{\mu_\perp}\frac{\partial f}{\partial\zeta} + \frac12\underbrace{\sum_{i,j}\hat n_i H_{ij}\hat n_j}_{\sigma_\perp^2}\frac{\partial^2 f}{\partial\zeta^2} \tag{19}$$

denoting the 'restriction' of $\mathcal{L}$ to the $\hat n$-direction ($\hat n_i$ denotes the ith
component of $\hat n$).

23 Curve when dim Z = 2, as illustrated.

In the DT zone an instantaneous rebalancing is performed, so the market
does not have time to move, and the value function is obtained by deducting
the cost of transacting towards the NT boundary. Hence at the boundary,
on the DT side, we have

$$\frac{\partial f}{\partial\theta} = +\varepsilon_+ \ \text{(buy boundary)}, \qquad \frac{\partial f}{\partial\theta} = -\varepsilon_- \ \text{(sell boundary)}, \tag{20}$$

with $\varepsilon_+ = \varepsilon_- = \varepsilon$ when buying and selling cost the same.
We take for granted that the $\theta$-derivative is continuous at the boundary²⁴.
Equations (18,20) define a two-point boundary-value problem whose solution
we wish to maximise w.r.t. the boundary location. This is precisely the
problem solved in [9] with minor alterations, as follows. In (18) there are
two volatilities: the one on the LHS, labelled '$\sigma_\perp$' in (19), and $\sigma_X$ which
occurs in U on the RHS. In [9], both are just $\sigma_X$, so we have to be a little
careful. Also, in the DT zone outside the boundary marked $S_+$, at $\zeta = \delta\zeta_+$,
one must buy the asset, but in the setup of [9], in the equivalent place, one
sells. This necessitates altering a few signs.

For clarity we quickly run through the argument of [9]. We first work
out what is going on in the NT zone. It is known that the equation
$(-r + \mathcal{L}_{\hat n})f = 0$ has two strictly positive solutions $f = C_+, C_-$ that are
respectively increasing and decreasing functions. Let $K(\zeta,\xi)$ be the Green's
function, that is, the solution to $(-r + \mathcal{L}_{\hat n})f = -\delta(\zeta - \xi)$; this will be
positive everywhere. By standard construction of solutions to linear ODEs, the
solution to (17), now written $f(\zeta,\theta)$ rather than $f(Z,\theta)$ as we only care
about variation in the $\hat n$-direction, is of the shape

$$f(\zeta,\theta) = \int_{-\infty}^{\infty} U(\xi,\theta)\,K(\zeta,\xi)\,d\xi + a_+(\theta)\,C_+(\zeta) + a_-(\theta)\,C_-(\zeta) \tag{21}$$

i.e. 'particular solution plus some multiple of the complementary function(s)'.

24 This is discussed in the extended version of [9].
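Invoking the boundary conditions (20) we have two equations, one with
$\zeta = h_+(\theta)$ at the buy boundary and one with $\zeta = h_-(\theta)$ at the sell boundary.
These give

$$a_+'(\theta) = \frac{I(h_-,\theta)\,C_-(h_+) - I(h_+,\theta)\,C_-(h_-) + \varepsilon_+ C_-(h_-) + \varepsilon_- C_-(h_+)}{C_+(h_+)\,C_-(h_-) - C_+(h_-)\,C_-(h_+)}$$

$$a_-'(\theta) = \frac{I(h_+,\theta)\,C_+(h_-) - I(h_-,\theta)\,C_+(h_+) - \varepsilon_+ C_+(h_-) - \varepsilon_- C_+(h_+)}{C_+(h_+)\,C_-(h_-) - C_+(h_-)\,C_-(h_+)} \tag{22}$$

where for clarity we have abbreviated $h_\pm(\theta)$ to $h_\pm$, and

$$I(\zeta,\theta) = \int_{-\infty}^{\infty} (\partial_2 U)(\xi,\theta)\,K(\zeta,\xi)\,d\xi; \qquad (-r + \mathcal{L}_{\hat n})I = -\partial_2 U. \tag{23}$$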

We now wish to maximise the part of (21) that is sensitive to the boundary
position, and this necessitates maximising $a_+(\theta)$ or $a_-(\theta)$, as the first part
is insensitive. As the same boundary specification must maximise the value
function at all possible points in factor space (i.e. all $\zeta$) simultaneously, we
can choose to maximise either $a_+$ or $a_-$ and the results should, and indeed
do, give the same answer. Furthermore, the same boundary specification
must also maximise the value function at all points in position space (i.e. all
$\theta$) simultaneously, so it is sufficient simply to maximise $a_\pm'$. This means that
we maximise either of the quantities in (22), and so differentiate w.r.t. $h_+$
and $h_-$. The result is a pair of coupled nonlinear equations in $h_+, h_-$. The
difference between these equations gives one result, pertaining to the width
of the NT zone; the sum gives a different one, pertaining to the displacement
from the costfree position. We then expand these in a Taylor series and
equate terms of equal order, so that $h_+(\theta) = \zeta + \delta\zeta_+$ and $h_-(\theta) = \zeta - \delta\zeta_-$
with the $\delta$ terms small. Define

$$W_{i,j} = C_+^{(i)} C_-^{(j)} - C_+^{(j)} C_-^{(i)}$$

where superscripts denote derivatives; this is to be understood as a
function of $\zeta$.
For the difference, and hence the buffer width, an expression of the
following form emerged²⁵:

$$\varepsilon W_{1,0} - \tfrac13\big(W_{3,1} - W_{3,0}\,\partial_1 + W_{1,0}\,\partial_1^3\big)\,I(\zeta,\theta)\cdot\delta\zeta^3 = O(\varepsilon\,\delta\zeta^2,\ \delta\zeta^5)$$

with $\delta\zeta = \frac12(\delta\zeta_+ + \delta\zeta_-)$. As described in [9], all expressions of the form
$W_{i,j}/W_{k,l}$ relate directly to the coefficients of the ODE $(-r + \mathcal{L}_{\hat n})f = 0$ of
which $C_\pm$ are the solutions (q.v. (19)): this means that one need not know what
$C_\pm(\zeta)$ actually are. The second part of (23) then allows further
simplification.
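
The statement that ratios of the $W_{i,j}$ depend only on the ODE coefficients can be checked in the simplest case. The sketch below assumes constant $\mu_\perp$, $\sigma_\perp$ and $r$ (an assumption made here for illustration, not one made in the paper), so that $C_\pm = e^{\lambda_\pm\zeta}$ with $\lambda_\pm$ the roots of $\frac12\sigma_\perp^2\lambda^2 + \mu_\perp\lambda - r = 0$. It then confirms numerically that $W_{2,0}/W_{1,0} = -\mu_\perp/\frac12\sigma_\perp^2$ and $W_{2,1}/W_{1,0} = -r/\frac12\sigma_\perp^2$.

```python
import math


def characteristic_roots(mu, sigma, r):
    """Roots of 0.5*sigma^2*lam^2 + mu*lam - r = 0 (constant coefficients)."""
    half_var = 0.5 * sigma * sigma
    disc = math.sqrt(mu * mu + 4.0 * half_var * r)
    lam_plus = (-mu + disc) / (2.0 * half_var)    # positive: C_+ increasing
    lam_minus = (-mu - disc) / (2.0 * half_var)   # negative: C_- decreasing
    return lam_plus, lam_minus


def w(i, j, lam_plus, lam_minus, zeta=0.0):
    """W_{i,j} = C_+^(i) C_-^(j) - C_+^(j) C_-^(i) for exponential C_+/-."""
    common = math.exp((lam_plus + lam_minus) * zeta)
    return (lam_plus ** i * lam_minus ** j - lam_plus ** j * lam_minus ** i) * common


# Hypothetical parameter values.
mu, sigma, r = 0.3, 0.8, 0.05
lam_p, lam_m = characteristic_roots(mu, sigma, r)
ratio_20 = w(2, 0, lam_p, lam_m) / w(1, 0, lam_p, lam_m)
ratio_21 = w(2, 1, lam_p, lam_m) / w(1, 0, lam_p, lam_m)
```

In this special case the ratios reduce to the sum and product of the characteristic roots, which is why the explicit form of $C_\pm$ never enters.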

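By the Implicit Function Theorem, the optimal half-width in the
$\hat n$-direction is at leading order, from [9],

$$\delta\zeta = \tfrac12(\delta\zeta_+ + \delta\zeta_-) \sim \left(\frac{3\varepsilon\,\sigma_\perp^2}{2\,(\partial_1\partial_2 U)(\zeta,\theta)}\right)^{1/3}.$$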



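Now $g_0(Z)$ is the value of $\theta$ that maximises $U(Z,\theta)$, so $(\partial_2 U)(Z, g_0(Z)) = 0$
for all Z. Differentiating,

$$(\nabla_1\partial_2 U)(Z, g_0(Z)) + (\partial_2^2 U)(Z, g_0(Z))\,\nabla g_0(Z) = 0,$$

with $\nabla_1$ indicating a (vector) derivative w.r.t. the first argument Z. Taking
the scalar product with $\hat n$ and noting that the derivative in the $\hat n$-direction,
which we denote $\partial_\zeta$, is simply $\hat n\cdot\nabla$, we have

$$(\partial_\zeta\partial_2 U)(Z, g_0(Z)) = -|\nabla g_0(Z)|\,(\partial_2^2 U)(Z, g_0(Z)) = |\nabla g_0(Z)|\,\sigma_X^2\,G^{-1},$$

the last step following from the definition of U. Hence

$$\delta\zeta \sim \left(\frac{3\varepsilon\,\sigma_\perp^2\,G}{2\,|\nabla g_0(Z)|\,\sigma_X^2}\right)^{1/3} \tag{24}$$

(this is independent of G, as $g_0(Z) \propto G$). Multiplying by $|\nabla g_0(Z)|$ gives the
half-width in the $\theta$-direction:

$$\delta\theta \sim \left(\frac{3\varepsilon\,\sigma_\perp^2\,|\nabla g_0(Z)|^2\,G}{2\,\sigma_X^2}\right)^{1/3}.$$

25 On the LHS, even powers of $\delta\zeta$ vanish by symmetry arguments, and the term
linear in $\delta\zeta$ also vanishes. Thus the algebra is laborious, as one must differentiate
an already messy expression several times.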
But

$$V_t[dg_0(Z_t)] = \sum_{i,j} (\nabla g_0(Z))_i\,H_{ij}\,(\nabla g_0(Z))_j\,dt = \sigma_\perp^2\,|\nabla g_0(Z)|^2\,dt, \tag{25}$$

the last step by (19), so we find

$$\delta\theta \sim \left(\frac{3\varepsilon\,G\,V_t[dg_0(Z_t)]}{2\,V_t[dX_t]}\right)^{1/3}$$

as contended. In the OU case $dX_t = -bX_t\,dt + \sigma\,dW_t$, we have $g_0(X) =
-bGX/\sigma^2$, so $\Gamma_\theta^2 = b^2G^2/\sigma^4$ and

$$\delta\theta \sim (3\varepsilon b^2/2\sigma^4)^{1/3}\,G,$$

as previously obtained in [9].
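
As a numerical cross-check of the chain of steps above, the sketch below evaluates the half-width in the $\zeta$-direction, multiplies by $|\nabla g_0|$, and compares with the OU closed form. It assumes the forms $\partial_\zeta\partial_2 U = |\nabla g_0|\sigma_X^2/G$ and $\delta\zeta \sim (3\varepsilon\sigma_\perp^2 / 2\,\partial_\zeta\partial_2 U)^{1/3}$ used in the derivation; the function names and parameter values are hypothetical.

```python
import math


def halfwidth_zeta(eps, sigma_perp, grad_norm, sigma_x, gearing):
    """Half-width along the normal direction in factor space."""
    cross = grad_norm * sigma_x ** 2 / gearing   # (d_zeta d_2 U) at the optimum
    return (3.0 * eps * sigma_perp ** 2 / (2.0 * cross)) ** (1.0 / 3.0)


def halfwidth_theta(eps, sigma_perp, grad_norm, sigma_x, gearing):
    """Half-width in position space: |grad g0| times the zeta half-width."""
    return grad_norm * halfwidth_zeta(eps, sigma_perp, grad_norm, sigma_x, gearing)


def ou_halfwidth_theta(eps, b, sigma, gearing):
    """OU closed form (3 eps b^2 / (2 sigma^4)) ** (1/3) * G quoted in the text."""
    return gearing * (3.0 * eps * b * b / (2.0 * sigma ** 4)) ** (1.0 / 3.0)


# Hypothetical OU parameters: the single factor is X itself, so
# sigma_perp = sigma_x = sigma and |grad g0| = b * G / sigma^2.
eps, b, sigma, gearing = 0.005, 0.1, 0.2, 1.0e6
grad_norm = b * gearing / sigma ** 2
general = halfwidth_theta(eps, sigma, grad_norm, sigma, gearing)
closed_form = ou_halfwidth_theta(eps, b, sigma, gearing)
zeta_width_g1 = halfwidth_zeta(eps, sigma, b * 1.0 / sigma ** 2, sigma, 1.0)
zeta_width_g2 = halfwidth_zeta(eps, sigma, grad_norm, sigma, gearing)
```

The last two quantities illustrate the remark that the $\zeta$-direction width does not depend on G, because $|\nabla g_0|$ scales with G.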

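We now turn to the displacement of the centre of the NT zone from
the costfree case. Formally we mean the following: for any Z, the optimal
costfree position is $\theta = g_0(Z)$ and the boundaries of the NT zone (obtained
by moving in the $\theta$-direction keeping Z fixed) are at

$$\theta = g_0(Z) \pm \delta\theta + d\theta$$

where $\delta\theta$ is the halfwidth as previously obtained and we call $d\theta$ the
displacement. Taylor analysis (of the sum of the nonlinear equations previously
mentioned) gives an expression of the form

$$\varepsilon\,\delta\zeta\,W_{2,0} + \big(W_{2,1} - W_{2,0}\,\partial_1 + W_{1,0}\,\partial_1^2\big)\,I(\zeta,\theta)\cdot\delta\zeta^2 = O(\varepsilon\,\delta\zeta^3,\ \delta\zeta^4)$$

from which (as the expression in front of $I(\zeta,\theta)$ is, up to a factor, the
differential operator $(-r + \mathcal{L}_{\hat n})$) we find

$$\varepsilon\,\frac{W_{2,0}}{W_{1,0}} \sim -\frac{\delta\zeta}{\frac12\sigma_\perp^2}\,(-r + \mathcal{L}_{\hat n})\,I(\zeta,\theta).$$

Now $(-r + \mathcal{L}_{\hat n})I = -\partial_2 U$, and $(\partial_2 U)(Z, g_0(Z)) = 0$ for all Z by optimality, so
evaluating at $\theta = g_0(Z) + d\theta$, i.e. the midpoint of the NT zone, we can
approximate

$$(\partial_2 U)(Z, g_0(Z) + d\theta) \sim (\partial_2^2 U)(Z,\theta)\cdot d\theta = -\sigma_X^2\,G^{-1}\,d\theta.$$

Thus, as $W_{2,0}/W_{1,0} = -\mu_\perp/\frac12\sigma_\perp^2$ directly from the ODE,

$$\varepsilon\,\mu_\perp \sim \delta\zeta\,d\theta\,\sigma_X^2\,G^{-1}$$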
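and so

$$d\theta \sim \frac{\varepsilon\,\mu_\perp\,G}{\sigma_X^2\,\delta\zeta}. \tag{26}$$

But

$$E_t[dg_0(Z_t)] = \sum_i (\nabla g_0(Z))_i\,\mu_{Z_i}\,dt = \mu_\perp\,|\nabla g_0(Z)|\,dt,$$

so using (24) for $\delta\zeta$ and recalling (25) and the definition of $\Gamma_\theta$ we have

$$d\theta \sim \frac{E_t[dg_0(Z_t)]}{V_t[dX_t]}\left(\frac{2\varepsilon^2 G^2}{3\,\Gamma_\theta^2}\right)^{1/3}. \tag{27}$$

In the OU case the factor on the front works out as $-\theta b/\sigma^2$. One then has
the simple result

$$d\theta \sim -\theta\cdot(2\varepsilon^2 b/3\sigma^2)^{1/3},$$

as previously obtained in [9].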
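
The OU reduction of the displacement can be checked numerically. The sketch below assumes the displacement has the form $(E_t[d\theta]/V_t[dX_t])\,(2\varepsilon^2G^2/3\Gamma_\theta^2)^{1/3}$ with $\Gamma_\theta^2 = V_t[d\theta]/V_t[dX_t]$, which is the form that reproduces the OU result quoted above; the names and parameter values are hypothetical.

```python
import math


def displacement(drift_ratio, eps, gearing, gamma_sq):
    """Assumed general form: (E[d theta]/V[dX]) * (2 eps^2 G^2 / (3 Gamma^2))^(1/3)."""
    return drift_ratio * (2.0 * eps ** 2 * gearing ** 2 / (3.0 * gamma_sq)) ** (1.0 / 3.0)


def ou_displacement(theta, eps, b, sigma):
    """OU closed form -theta * (2 eps^2 b / (3 sigma^2)) ** (1/3) quoted in the text."""
    return -theta * (2.0 * eps ** 2 * b / (3.0 * sigma ** 2)) ** (1.0 / 3.0)


# Hypothetical OU parameters and current costfree position theta.
eps, b, sigma, gearing, theta = 0.005, 0.1, 0.2, 1.0e6, 2.5e5
gamma_sq = (b * gearing / sigma ** 2) ** 2   # Gamma^2 = b^2 G^2 / sigma^4
drift_ratio = -theta * b / sigma ** 2        # E[d theta] / V[dX] in the OU case
general = displacement(drift_ratio, eps, gearing, gamma_sq)
closed_form = ou_displacement(theta, eps, b, sigma)
```

The negative sign for a positive position reflects the expected decay of a mean-reverting target: the buffer is shifted towards where the position is expected to go.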